ITEM #20, CONTINUED: decomposition and representation theorems for
semimartingales, formulas for absolutely continuous change of probability
measure (e.g., the Girsanov formula), and the study of Ito-sense stochastic
differential equations with discontinuous coefficients. It seems fair to
say that these developments in stochastic processes were in turn to an
extent influenced by their applications in stochastic control. For
controlled Markov diffusion processes, there is a direct connection with
certain nonlinear partial differential equations via the dynamic
programming equation. These equations are of second order, elliptic or
parabolic, and possibly degenerate. Stochastic control gives a way to
represent their solutions probabilistically. There is an unforeseen
connection with differential geometry via the Monge-Ampère equation.

Broadly speaking, stochastic control theory deals with models of systems
whose evolution is affected both by certain random influences and also by
certain inputs chosen by a "controller". The authors are concerned here
only with state-space formulations of control problems in continuous time.
Moreover, the authors consider only Markovian control problems in which the
state $x_t$ of the process being controlled is Markov provided the
controller follows a Markov control policy. The authors shall not discuss
at all the extensive engineering literature on input-output formulations,
particularly for linear system models; see Åström [1].

The authors shall mainly discuss the case of continuously acting control,
in which at each time $t$ a control $u_t$ is applied to the system.
However, in §8 the authors briefly mention impulsive control problems, in
which control is applied only at discrete time instants. In optimal
stochastic control theory the goal is to minimize (or maximize) some
criterion depending on the states $x_t$ and controls $u_t$ during some
finite or infinite time interval. In §2 the authors formulate a class of
optimal control problems for Markov processes, with criterion (2.2) to be
minimized. The distinction between problems in which $x_t$ is known to the
controller, and problems with partial observations is made there. When
$x_t$ is known, the dynamic programming method can be used. In principle,
this method leads directly to an optimal Markov control policy, although
it rarely gives the optimal policy explicitly. In §3, both analytical and
probabilistic approaches are indicated. Associated with dynamic
programming is the Nisio nonlinear semigroup (§4). In §6 a logarithmic
transformation is applied to positive solutions of the backward equation
of a Markov process. There results a controlled Markov process, leading to
connections between stochastic control and such topics as stochastic
mechanics, large deviations and nonlinear filtering. The case of
controlled, partially observed processes is mentioned in §7, along with
adaptive control of Markov processes. Finally, in §9, the authors indicate
a few of the various difficulties encountered in seeking to implement in
engineering applications the mathematically sophisticated results of the
theory, and mention some newer areas of application.
OPTIMAL CONTROL OF MARKOV PROCESSES
Wendell H. Fleming

1. Introduction. The purpose of this article is to give an
overview of some recent developments in optimal stochastic control
theory. The field has expanded a great deal during the last 20
years. It is not possible in this overview to go deeply into any
topic, and a number of interesting topics have been omitted entirely.
The list of References includes several books, conference proceedings
and survey articles.

The development of stochastic control theory has depended on
parallel advances in the theory of stochastic processes and on certain
topics in partial differential equations. On the probabilistic side
one can mention decomposition and representation theorems for
semimartingales, formulas for absolutely continuous change of probability
measure (e.g. the Girsanov formula), and the study of Ito-sense
stochastic differential equations with discontinuous coefficients.
It seems fair to say that these developments in stochastic processes
were in turn to an extent influenced by their applications in
stochastic control. For controlled Markov diffusion processes, there
is a direct connection with certain nonlinear partial differential
equations via the dynamic programming equation. These equations are
of second order, elliptic or parabolic, and possibly degenerate.
Stochastic control gives a way to represent their solutions
probabilistically. There is an unforeseen connection with differential
geometry via the Monge-Ampère equation.

Broadly speaking, stochastic control theory deals with models of
systems whose evolution is affected both by certain random influences
and also by certain inputs chosen by a "controller". We are concerned
here only with state-space formulations of control problems in
continuous time. Moreover, we consider only Markovian control problems in
which the state $x_t$ of the process being controlled is Markov provided
the controller follows a Markov control policy. We shall not discuss at
all the extensive engineering literature on input-output formulations,
particularly for linear system models; see Åström [1].

We shall mainly discuss the case of continuously acting control,
in which at each time $t$ a control $u_t$ is applied to the system.
However, in §8 we briefly mention impulsive control problems, in which
control is applied only at discrete time instants. In optimal stochastic
control theory the goal is to minimize (or maximize) some criterion
depending on the states $x_t$ and controls $u_t$ during some finite or
infinite time interval. In §2 we formulate a class of optimal control
problems for Markov processes, with criterion (2.2) to be minimized.
The distinction between problems in which $x_t$ is known to the
controller, and problems with partial observations is made there. When
$x_t$ is known, the dynamic programming method can be used. In principle,
this method leads directly to an optimal Markov control policy, although
it rarely gives the optimal policy explicitly. In §3, both analytical
and probabilistic approaches are indicated. Associated with dynamic
programming is the Nisio nonlinear semigroup (§4). In §5 we discuss
methods of approximate solution and special problems. In §6 a
logarithmic transformation is applied to positive solutions of the backward
equation of a Markov process. There results a controlled Markov process,
leading to connections between stochastic control and such topics as
stochastic mechanics, large deviations and nonlinear filtering. The
case of controlled, partially observed processes is mentioned in §7,
along with adaptive control of Markov processes. Finally, in §9 we
indicate a few of the various difficulties encountered in seeking to
implement in engineering applications the mathematically sophisticated
results of the theory, and mention some newer areas of application.

2. Controlled Markov processes. We consider optimal stochastic control
problems of the following kind. We are given metric spaces $\Sigma$, $U$
called the state space and control space, respectively. For each fixed
$u \in U$ there is a linear operator $L^u$ which generates a Markov, Feller
process with state space $\Sigma$. The domain of $L^u$ contains, for each
$u \in U$, a set $D$ dense in the space $C(\Sigma)$ of bounded uniformly
continuous functions on $\Sigma$. The state and control processes $x_t$,
$u_t$ are defined on some probability space $(\Omega, \mathcal{F}, P)$. The
$\Sigma$-valued process $x_t$ is adapted to some increasing family
$\{\mathcal{F}_t\}$ of $\sigma$-algebras, and the trajectories $x_\cdot$ are
right continuous. The $U$-valued process $u_t$ is predictable with respect
to an increasing family $\{\mathcal{G}_t\}$ of $\sigma$-algebras. The
$\sigma$-algebra $\mathcal{G}_t$ describes in a measure theoretic way the
information available to the controller at time $t$. The processes
$(x_t, u_t)$ are related by the requirement that

(2.1)  $M_g(t) = g(x_t) - g(x_0) - \int_0^t L^{u_s} g(x_s)\,ds$

is a $(\{\mathcal{F}_t\}, P)$ martingale for every $g \in D$. We consider a
fixed, finite time interval $0 \le t \le T$, and the objective to minimize a
criterion of the form of an expectation

(2.2)  $J = E\left\{ \int_0^T k(x_t, u_t)\,dt + G(x_T) \right\}$.


Example 1. Controlled finite-state Markov chain, with
$\Sigma = \{1, 2, \ldots, N\}$. In this case $L^u$ is identified with the
infinitesimal matrix $(q^u_{ij})$ of the chain. When the control $u_t$ is
applied, the jumping rate of $x_t$ from state $i$ to $j$ is $q^{u_t}_{ij}$.

Example 2. Controlled diffusion process with $\Sigma = R^n$,

(2.3)  $x_t = x_0 + \int_0^t f(x_s, u_s)\,ds + \int_0^t \sigma(x_s, u_s)\,dw_s$,

with $w_t$ a brownian motion (of some dimension $d$) independent of the
initial state $x_0$. In this case

(2.4)  $L^u g = \frac{1}{2} \sum_{i,j=1}^n a_{ij}(x,u)\, g_{x_i x_j} + \sum_{i=1}^n f_i(x,u)\, g_{x_i}$,

with $a = \sigma\sigma'$ and
$D = \{g : g, g_{x_i}, g_{x_i x_j} \in C(R^n),\ i,j = 1, \ldots, n\}$.
The diffusion is called nondegenerate if the eigenvalues of $a(x,u)$
are bounded below by $c > 0$.

Further assumptions, which vary from author to author in the
literature, need to be made. To avoid undue complication, in the
discussion to follow we take a compact control space $U$, and $k(x,u)$,
$G(x)$ bounded, uniformly continuous. In (2.3), $f(x,u)$, $\sigma(x,u)$ are
bounded and as smooth as necessary. The $\sigma$-algebras $\mathcal{F}_t$,
$\mathcal{G}_t$ are right continuous and completed.

If $x_t$ is $\mathcal{G}_t$-measurable, then the controller can observe the
state $x_t$. In this case, one may as well take
$\mathcal{G}_t = \mathcal{F}_t$ and known initial state $x_0$. This is the
situation in §§3-6 to follow. If (2.1) holds, we call

$\alpha = (\Omega, \mathcal{F}, P, \{\mathcal{F}_t\}, x_\cdot, u_\cdot)$

an admissible system for the control problem with completely observed
states.

A Markov control policy is a Borel measurable function $\underline{u}$ from
$[0,T] \times \Sigma$ into $U$. An admissible system $\alpha$ is obtained
via a Markov control policy $\underline{u}$ if

(2.5)  $u_t = \underline{u}(t, x_t)$.

Given $\underline{u}$ and $x_0 \in \Sigma$, one would like to know whether a
corresponding admissible system exists, with $x_t$ a Markov process. Under
sufficiently strong restrictions this is well known; for instance, in
case of controlled diffusions a Lipschitz condition on $\underline{u}(t,x)$
would imply the classical Ito conditions. For nondegenerate controlled
diffusions, existence follows from Krylov [8, p. 87] for any bounded
$\underline{u}$. The Markov property of $x_t$ can be obtained under stronger
hypotheses. For instance, for nondegenerate diffusions it holds if in
(2.3) $\sigma = \sigma(x)$. A martingale method for obtaining the Markov
property is to show that the probability distribution
$P^{\underline{u}}_{x_0}$ of the state trajectory $x_\cdot$ is unique and
depends continuously on the initial state $x_0$ [59]. In general $x_t$ is
only a weak-sense solution to (2.3), since neither the probability space
nor the brownian motion $w_t$ is given in advance. However, in the
nondegenerate case with $\sigma = \sigma(x)$ a result of Veretennikov [62]
gives a strong solution.
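
As a concrete illustration of (2.3) and (2.5), the following Python sketch
simulates a scalar controlled diffusion under a Markov control policy by the
Euler-Maruyama scheme and estimates the criterion (2.2) by Monte Carlo. The
drift f(x,u) = u, the constant noise coefficient, the costs k and G, and the
two policies are illustrative assumptions, not taken from the text, and the
time discretization only approximates the continuous-time model.

```python
import math
import random

def simulate_cost(policy, f, sigma, k, G, x0, T, n_steps, rng):
    """One Euler-Maruyama path of (2.3) under u_t = policy(t, x_t).

    Returns the sampled value of the cost inside the expectation in (2.2).
    """
    dt = T / n_steps
    x, cost = x0, 0.0
    for i in range(n_steps):
        u = policy(i * dt, x)                 # Markov policy, as in (2.5)
        cost += k(x, u) * dt                  # running cost, left-endpoint rule
        dw = rng.gauss(0.0, math.sqrt(dt))    # Brownian increment over dt
        x += f(x, u) * dt + sigma(x, u) * dw
    return cost + G(x)

def estimate_J(policy, n_paths=2000, seed=0, **model):
    """Monte Carlo estimate of J in (2.2) for a given Markov policy."""
    rng = random.Random(seed)
    total = sum(simulate_cost(policy, rng=rng, **model) for _ in range(n_paths))
    return total / n_paths

# Hypothetical model: U = [-1, 1], f = u, sigma = 0.5, k = x^2, G = x^2.
model = dict(
    f=lambda x, u: u,
    sigma=lambda x, u: 0.5,
    k=lambda x, u: x * x,
    G=lambda x: x * x,
    x0=1.0, T=1.0, n_steps=200,
)
push_to_zero = lambda t, x: max(-1.0, min(1.0, -5.0 * x))  # bounded feedback
do_nothing = lambda t, x: 0.0

J_feedback = estimate_J(push_to_zero, **model)
J_zero = estimate_J(do_nothing, **model)
```

Comparing `J_feedback` with `J_zero` shows how the choice of Markov policy
changes the criterion; neither policy is claimed to be optimal.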

3. Dynamic programming. The dynamic programming approach to the
control problem with completely observed states $x_t$ can be described
in a purely formal way, as follows. For initial state $x_0 \in \Sigma$ and
admissible system $\alpha$, write $J = J(T, x_0, \alpha)$ in (2.2). Let

(3.1)  $W(T, x_0) = \inf_{\alpha} J(T, x_0, \alpha)$.

Formal reasoning indicates that $W(T,x)$ should satisfy the dynamic
programming equation

(3.2)  $\frac{\partial W}{\partial T} = AW, \quad T > 0,$

with initial data $W(0,x) = G(x)$, where

(3.3)  $Ag(x) = \min_{u \in U} [L^u g(x) + k(x,u)]$.

Formally, an optimal Markov policy $\underline{u}^*$ is found by requiring
$\underline{u}^*(t,x)$ to minimize $L^u W(T - t, x) + k(x,u)$ among all
$u \in U$. Instead of the finite time control problem, control until $x_t$
exits a given open set $\mathcal{O} \subset \Sigma$ can be considered. In
that case the dynamic programming equation becomes the autonomous form of
(3.2) in $\mathcal{O}$ with $W(x) = G(x)$ for $x \in \partial\mathcal{O}$.
There are also autonomous dynamic programming equations associated with
the infinite time control problem, with discounted cost or average cost
per unit time criteria to be minimized.
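
For the controlled finite-state chain of Example 1, the dynamic programming
equation (3.2)-(3.3) is a system of N ordinary differential equations and
can be integrated directly. The Python sketch below uses an explicit Euler
step in T; the two-state rates, the costs, and the step size are
hypothetical, and explicit Euler is an assumed discretization chosen only
for brevity.

```python
def dp_step(W, Q, k, dT):
    """One explicit Euler step of dW/dT = min_u [Q^u W + k(., u)], see (3.2)-(3.3).

    Q[u][i][j] is the jump rate i -> j under control u (each row sums to zero),
    k[u][i] is the running cost in state i under control u.
    Returns the updated W and the index of the minimizing control in each state.
    """
    n = len(W)
    new_W, argmin = [], []
    for i in range(n):
        values = [sum(Q[u][i][j] * W[j] for j in range(n)) + k[u][i]
                  for u in range(len(Q))]
        best = min(range(len(Q)), key=values.__getitem__)
        new_W.append(W[i] + dT * values[best])
        argmin.append(best)
    return new_W, argmin

def solve_dp(G, Q, k, T, n_steps):
    """Integrate from W(0, .) = G up to time-to-go T.

    The returned policy is the minimizer found at the last step, that is,
    the control to apply when the time to go is approximately T.
    """
    W, policy = list(G), None
    for _ in range(n_steps):
        W, policy = dp_step(W, Q, k, T / n_steps)
    return W, policy

# Hypothetical two-state chain: control 1 leaves state 1 faster but costs more.
Q = [
    [[-1.0, 1.0], [0.5, -0.5]],   # control 0: slow return from state 1
    [[-1.0, 1.0], [3.0, -3.0]],   # control 1: fast return from state 1
]
k = [
    [0.0, 2.0],                   # control 0: state 1 is costly
    [0.0, 2.5],                   # control 1: same, plus an effort cost 0.5
]
G = [0.0, 0.0]
W, policy = solve_dp(G, Q, k, T=1.0, n_steps=1000)
```

With a small enough step the scheme is monotone, so the computed W is no
larger than the cost obtained by fixing either control throughout.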

In the rigorous mathematical treatment of dynamic programming there
is one easy result, the so-called Verification Theorem [7, p. 159].
Roughly speaking, it states that if $W(T,x)$ satisfying (3.2) with the
initial data and the associated Markov policy $\underline{u}^*$ are both
"sufficiently regular", then $\underline{u}^*$ is indeed optimal and
$W(T,x)$ is the minimum performance in (3.1). The Verification Theorem is
used to obtain explicit solutions, in those cases where such a solution is
known. Much more difficult are the questions of existence of sufficiently
regular $W$ and $\underline{u}^*$, and there is a large literature dealing
with various aspects of them. One approach is analytical, with the
stochastic interpretation made afterward. In this approach, existence of
solutions to the dynamic programming equation and their regularity
properties are studied, using non-probabilistic methods. It is then proved
that optimal (or at least $\varepsilon$-optimal) Markov control policies
exist. A second approach is probabilistic. In this approach, one starts
with the minimum cost function $W$ in (3.1) and develops stochastic
counterparts to the dynamic programming conditions for a minimum. A third
approach is to consider an associated nonlinear semigroup (§4). While this
approach leads to fewer technical difficulties than either of the other
two, it also leads to weaker results.

For controlled diffusions the analytical approach is remarkably well
developed. See Krylov [8], Lions [45]. In the nondegenerate case the
dynamic programming equation is a second order nonlinear partial
differential equation of parabolic type, also called a
Hamilton-Jacobi-Bellman equation. In various other formulations, with
$x_t$ controlled for all time $t \ge 0$ or until exit from an open set
$\mathcal{O}$, the Hamilton-Jacobi-Bellman equation is elliptic rather than
parabolic. Under reasonable assumptions on the problem, the solution $W$
has generalized second derivatives which are locally bounded. In the
elliptic case a deeper regularity result of Evans [26][60] gives a
classical solution. In the degenerate case $W$ is less regular, with
locally bounded generalized first derivatives $W_{x_i}$. The dynamic
programming equation (3.2), suitably interpreted in terms of Schwartz
distributions, still holds [8][45].

For the case of controlled jump Markov processes, results on existence,
uniqueness and regularity of solutions to (3.2) were obtained by
Pragarauskas [52].

A large class of nonlinear elliptic or parabolic equations, satisfying
appropriate convexity conditions, can be represented as
Hamilton-Jacobi-Bellman equations. As Gaveau [35] pointed out, the
Monge-Ampère equation has such a representation.


In the probabilistic approach, the starting point is to rewrite
the dynamic programming principle in the following martingale form.
Given an admissible system $\alpha$ let

$m_t = \int_0^t k(x_s, u_s)\,ds + W(T - t, x_t)$.

Then $m_t$ is a $(\{\mathcal{F}_t\}, P)$ submartingale, and $\alpha$ is
optimal if and only if $m_t$ is a $(\{\mathcal{F}_t\}, P)$ martingale. With
the aid of the Doob-Meyer decomposition for submartingales and some
martingale representation theorems, conditions for optimality are
obtained. See Bismut [21], Davis [16], Elliott [25], El Karoui [5]. These
conditions are probabilistic counterparts of those expressed analytically
by the dynamic programming equation (3.2). With the probabilistic approach
difficult questions of regularity of solutions to (3.2) are avoided. The
probabilistic techniques give results about existence of optimal Markov
policies [21] [5, p. 218]. These methods also give conditions for a minimum
for optimal control under partial observations.
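
The submartingale property can be seen formally as follows. This is only a
sketch: it assumes that $W$ is smooth enough for the martingale condition
(2.1) to extend to the time-dependent function $(t,x) \mapsto W(T-t,x)$,
and $N_t$ denotes the resulting martingale term (a symbol introduced here
for illustration).

```latex
\begin{aligned}
m_t - m_0 &= \int_0^t \Big[ k(x_s,u_s) + L^{u_s} W(T-s,x_s)
             - \frac{\partial W}{\partial T}(T-s,x_s) \Big]\,ds + N_t , \\
k(x,u) + L^{u} W(T-s,x) &\ge \min_{v \in U}\big[ L^{v} W(T-s,x) + k(x,v) \big]
             = \frac{\partial W}{\partial T}(T-s,x)
             \quad \text{by (3.2)--(3.3).}
\end{aligned}
```

The integrand is therefore nonnegative, so $m_t$ is a martingale plus a
nondecreasing process. It is a martingale exactly when $u_s$ attains the
minimum in (3.3) for almost all $s$, which is the formal optimality
condition.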

A different kind of Markovian control problem for diffusions, in
which the control acts only on the boundary of a region in $R^n$, was
considered by Vermes [61].

4. The Nisio nonlinear semigroup. The dynamic programming principle
can be restated in another form, in terms of a semigroup of nonlinear
operators. In a purely formal way, this is done as follows. In
(2.2) we fix $k$ but consider various $G$. We rewrite the infimum in
(3.1) as $W(T,x) = S_T G(x)$. The dynamic programming principle is
formally equivalent to the semigroup property

(4.1)  $S_{T_1 + T_2} = S_{T_1} \circ S_{T_2}$

of the family $\{S_T\}$ of nonlinear operators. In addition, for
"sufficiently regular" $G$, one should have

(4.2)  $\frac{\partial}{\partial T} S_T G \big|_{T=0} = AG$.

This formal procedure was put on a rigorous basis by Nisio [10], who
obtained $\{S_T\}$ as a semigroup on the space $C(\Sigma)$ and showed under
some mild additional conditions that (4.2) holds for $G \in D$ (notation of
§2). Equations (4.1), (4.2) would imply the dynamic programming equation
(3.2) if we knew that $W(T,\cdot) = S_T G$ is sufficiently regular (in
particular, if $S_T$ maps $D$ into $D$). However, $W$ does not generally
have the desired regularity. In such instances (4.2) is a kind of weaker
substitute for (3.2).
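
The formal implication can be written out in one line. The sketch assumes
that $S_0$ is the identity and that $S_T G$ is regular enough for (4.2) to
be applied with $G$ replaced by $S_T G$; the second equality uses (4.1) and
the third uses (4.2).

```latex
\frac{\partial W}{\partial T}(T,\cdot)
  = \lim_{h \downarrow 0} \frac{S_{T+h} G - S_T G}{h}
  = \lim_{h \downarrow 0} \frac{S_h (S_T G) - S_T G}{h}
  = A (S_T G) = A\,W(T,\cdot),
\qquad W(0,\cdot) = S_0 G = G .
```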

Nisio's treatment is analytical. She obtains $S_T$ as the lower
envelope of the family of linear semigroups $S^u_T$, where for constant
control $u \in U$ the generator of $S^u_T$ coincides on $D$ with the
operator $L^u + k(\cdot,u)$. A stochastic treatment of the Nisio semigroup
is given in Bensoussan-Lions [2], and a uniqueness result in case of
nondegenerate diffusions in Nisio [51]. El Karoui, Lepeltier, and Marchal
[24] used another procedure, and obtained a nonlinear semigroup on a larger
space of bounded functions $G$ which are measurable in a suitable sense.
5. Explicit and approximate solutions. In a few instances the
dynamic programming equation (3.2) can be solved explicitly. Examples
are the well known stochastic linear regulator and Merton's optimal
portfolio selection problem [7, pp. 160-165]. For other special
problems the solution can be reduced to a free boundary problem. The
boundaries to be determined separate regions where some control
constraint holds or not. See for example Karatzas-Benes [40].
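
For a scalar version of the stochastic linear regulator the explicit
solution can be written down and checked numerically. The sketch below
assumes dynamics dx = (a x + b u) dt + s dw with costs k = q x^2 + r u^2 and
G = g x^2; all coefficient names are illustrative, and U is taken to be the
whole real line, which departs from the compactness assumption of §2.
Substituting W(T,x) = P(T) x^2 + c(T) into (3.2) gives the Riccati equation
P' = 2aP + q - (b^2/r) P^2 with P(0) = g, together with c' = s^2 P,
c(0) = 0, and the linear policy u*(t,x) = -(b/r) P(T - t) x.

```python
import math

def riccati_rhs(P, a, b, q, r):
    """Right side of the Riccati equation P' = 2aP + q - (b^2 / r) P^2."""
    return 2.0 * a * P + q - (b * b / r) * P * P

def solve_regulator(a, b, s, q, r, g, T, n_steps=1000):
    """Return (P(T), c(T)) with W(T, x) = P(T) x^2 + c(T).

    P is advanced by the classical fourth-order Runge-Kutta method and
    c' = s^2 P is integrated with the trapezoid rule.
    """
    h = T / n_steps
    P, c = g, 0.0
    for _ in range(n_steps):
        k1 = riccati_rhs(P, a, b, q, r)
        k2 = riccati_rhs(P + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(P + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(P + h * k3, a, b, q, r)
        P_next = P + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        c += 0.5 * h * s * s * (P + P_next)
        P = P_next
    return P, c

def optimal_control(x, P_remaining, b, r):
    """Linear feedback u* = -(b / r) P x, with P evaluated at the time to go."""
    return -(b / r) * P_remaining * x

# With a = 0, b = q = r = 1, g = 0 the Riccati equation is P' = 1 - P^2,
# whose solution is P(T) = tanh(T); then c(T) = s^2 * log(cosh(T)).
P, c = solve_regulator(a=0.0, b=1.0, s=1.0, q=1.0, r=1.0, g=0.0, T=1.0)
```

The special case in the last lines gives a closed form against which the
numerical integration can be compared.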

When a solution cannot be found by special methods, one can seek
an approximate solution to (3.2). One class of approximate methods
involves discretizations of (3.2). Among such methods the algorithm of
Kushner [9] has a natural stochastic control interpretation. The
difference equations associated with the algorithm correspond to the
dynamic programming equation for an approximating controlled Markov
chain. For the special case of controlled one-dimensional diffusions,
Borkar and Varaiya [22] used a procedure in which piecewise-constant
approximating Markov control policies are allowed.
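
To make the idea of an approximating controlled Markov chain concrete, the
following Python sketch builds transition probabilities on a grid of
spacing h for a one-dimensional controlled diffusion, using a standard
upwind finite-difference construction. It is offered in the spirit of the
method described above and is not claimed to reproduce the exact scheme of
[9]; the function names and the sample drift and noise coefficients are
hypothetical. The `backup` function is the chain's dynamic programming
operator for an exit-time problem of the kind mentioned in §3.

```python
def chain_step(x, u, h, f, sigma):
    """Upwind transition law of the approximating chain at grid point x.

    Returns (p_up, p_down, dt): probabilities of moving to x + h and x - h,
    and the time increment attached to the step. Assumes that sigma(x, u)
    and f(x, u) are not both zero.
    """
    drift = f(x, u)
    var = sigma(x, u) ** 2
    norm = var + h * abs(drift)
    p_up = (0.5 * var + h * max(drift, 0.0)) / norm
    p_down = (0.5 * var + h * max(-drift, 0.0)) / norm
    dt = h * h / norm
    return p_up, p_down, dt

def backup(x, W_up, W_down, controls, h, f, sigma, k):
    """min over u of k(x,u) dt + p_up W(x+h) + p_down W(x-h)."""
    best = None
    for u in controls:
        p_up, p_down, dt = chain_step(x, u, h, f, sigma)
        value = k(x, u) * dt + p_up * W_up + p_down * W_down
        if best is None or value < best:
            best = value
    return best

# Hypothetical coefficients: mean-reverting drift shifted by the control.
drift = lambda x, u: u - x
noise = lambda x, u: 1.0
p_up, p_down, dt = chain_step(0.3, 1.0, 0.1, drift, noise)
mean_increment = 0.1 * (p_up - p_down)   # equals drift * dt (local consistency)
```

The chain is locally consistent with (2.3): the mean increment equals
f dt exactly, and the second moment h^2 equals sigma^2 dt plus a term of
order h dt.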

Other results give approximate solutions to (3.2) when the state
process $x_t$ is a nearly-deterministic controlled diffusion. In (2.3)
let $\sigma$ be scaled by a small parameter $\varepsilon$. The solution is
sought in the form of an asymptotic series in $\varepsilon$. In [27] this
is done by expanding the solution $W^\varepsilon(T,x)$ in an asymptotic
series. The expansion is valid in regions where the solution $W^0(T,x)$
of the corresponding Hamilton-Jacobi equation is smooth. In [20]
Bensoussan obtains an asymptotic expansion, using a stochastic maximum
principle instead of (3.2).
6. A logarithmic transformation. Consider a linear operator of
the form $L + V(x)$, where $L$ is the generator of a Markov process
with state space $\Sigma$. The initial value problem

(6.1)  $\frac{\partial \phi}{\partial T} = L\phi + V(x)\phi$

with data $\phi(0,x) = \Phi(x)$ has a probabilistic solution by a well
known formula of Feynman-Kac type. For positive solutions of (6.1) another
probabilistic representation for $\phi(T,x)$ can often be found in the
following way. The logarithmic transformation $I = -\log\phi$ changes (6.1)
into the nonlinear equation

(6.2)  $\frac{\partial I}{\partial T} = H(I) - V(x)$,

(6.3)  $H(I) = -e^{I} L(e^{-I})$.

If one can find a control problem of the kind in §2 such that

(6.4)  $H(I) = \min_{u \in U} [L^u I + k(x,u)]$,

then (6.2) is the dynamic programming equation (3.2). The stochastic
control interpretation of $I(T,x)$ is as the minimum of the criterion
$J$ in (2.2). Thus, in (3.1) we have $W = I$. For a nondegenerate
diffusion obeying the stochastic differential equation

(6.5)  $d\xi_t = b(\xi_t)\,dt + \sigma(\xi_t)\,dw_t$,

a Markov control policy $\underline{u}(t,x)$ changes the generator $L$ to
$L^{\underline{u}}$, corresponding to change of drift from $b(x)$ to
$\underline{u}(t,x)$ in (6.5).
In (2.2) one takes

$k(x,u) = \frac{1}{2}(b(x) - u)'\, a^{-1}(x)\,(b(x) - u)$,

with $a = \sigma\sigma'$. An appropriate control problem for the case of
$\xi_t$ a jump Markov process is described in [31], and for a general class
of Markov processes in Sheu's thesis [58]. The change of generator from $L$
to $L^{\underline{u}}$ corresponds to a change of probability measure. It
was pointed out by M. Day that this change of measure results by
conditioning with respect to $\sigma(x_T)$. See [31, (4.5)].
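
For the diffusion (6.5) the identity (6.4) can be verified by a direct
calculation. The sketch below assumes that $I$ is twice differentiable in
$x$ and that the control space is all of $R^n$ (an assumption made here so
that the minimum over $u$ is unconstrained); $\nabla I$ denotes the
gradient of $I$ in $x$.

```latex
\begin{aligned}
L g &= \tfrac12 \sum_{i,j} a_{ij}(x)\, g_{x_i x_j} + b(x)\cdot\nabla g , \\
H(I) &= -e^{I} L\big(e^{-I}\big)
      = \tfrac12 \sum_{i,j} a_{ij}\, I_{x_i x_j} + b\cdot\nabla I
        - \tfrac12\, (\nabla I)'\, a\, \nabla I , \\
\min_{u}\Big[ \tfrac12 \sum_{i,j} a_{ij}\, I_{x_i x_j} + u\cdot\nabla I
        + \tfrac12 (b-u)'\, a^{-1} (b-u) \Big]
     &= H(I), \qquad \text{attained at } u = b - a\,\nabla I .
\end{aligned}
```

With this choice (6.2) reads
$\partial I/\partial T = \min_u [L^u I + k(x,u)] - V(x)$, so the potential
enters the criterion as an additional running cost $-V(x)$, and the
terminal cost is $-\log\Phi$.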

In case $L = \frac{1}{2}\Delta$, corresponding to $\xi_t$ a brownian motion,
(6.1) is the heat equation with a potential term. The stochastic control
interpretation of $S = -\log\phi$ is as least average action. Upon
rescaling, taking $L = \frac{\lambda}{2}\Delta$ and replacing $V$ by
$\lambda^{-1}V$, the usual least action is obtained as a "classical
mechanical limit" as $\lambda \to 0$ [29]. The heat equation with potential
is the "imaginary time" analogue of the Schrödinger equation of quantum
mechanics. There is an intriguing connection between stochastic control
and the Schrödinger equation, whose implications are not as yet well
understood [36]. This work is in the framework of Nelson's stochastic
mechanics. An apparently different theory of "stochastic mechanics" was
developed by Bismut [4].

Holland [39] gave a stochastic control interpretation of the dominant
eigenvalue of the Schrödinger equation as minimum mean total energy
of a particle in equilibrium. The approach was again based on a
logarithmic transformation and subsequently led to Sheu's treatment [58]
of the Donsker-Varadhan formula for the dominant eigenvalue of the
operator $L + V$ appearing in (6.1).

The Ventsel-Freidlin theory of large deviations deals with asymptotic
probabilities of rare events associated with nearly deterministic
Markov processes. The logarithmic transform gives another approach to
results of this kind. As an illustration we consider the problem of
exit from an open set $D \subset \Sigma$ during the time interval
$0 \le t \le T$. Let $x^\varepsilon_t$ be a Markov process tending to a
deterministic limit $x^0_t$ as $\varepsilon \to 0$. Let
$I^\varepsilon = -\varepsilon \log P_x(\tau^\varepsilon \le T)$, where
$\tau^\varepsilon$ is the exit time of $x^\varepsilon_t$ from $D$. Under
various assumptions (including a suitable scaling of $\varepsilon$),
$I^\varepsilon$ tends to a limit $I^0$, where $I^0(T,x)$ is the minimum of
a certain "action functional" among curves starting at $x \in D$ and
leaving $D$ by time $T$. In the stochastic control approach
$I^\varepsilon(T,x)$ is the minimum performance in a corresponding
stochastic control problem [28][31][58]. In this approach a minimum
principle is associated with the large deviation problem for
$\varepsilon > 0$, not just in the limit as $\varepsilon \to 0$.
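
A case where the limit can be checked by hand is scaled brownian motion
$x^\varepsilon_t = \sqrt{\varepsilon}\,w_t$ started at 0 with
$D = (-\infty, a)$, $a > 0$. This example and the function names below are
chosen here for illustration and are not taken from the text. By the
reflection principle the exit probability is
$\mathrm{erfc}(a / \sqrt{2\varepsilon T})$, and the action functional
$\frac{1}{2}\int_0^T |\dot{x}_t|^2\,dt$ is minimized over curves from 0
reaching $a$ by time $T$ by the straight line, with value $a^2/(2T)$.

```python
import math

def exit_rate(eps, a, T):
    """-eps * log P(max over t <= T of sqrt(eps) * w_t >= a), for a > 0.

    Uses the reflection principle: the probability equals
    2 * (1 - Phi(a / sqrt(eps * T))) = erfc(a / sqrt(2 * eps * T)).
    """
    prob = math.erfc(a / math.sqrt(2.0 * eps * T))
    return -eps * math.log(prob)

def action_minimum(a, T):
    """Minimum of (1/2) * integral of |dx/dt|^2 over curves from 0 reaching a by time T."""
    return a * a / (2.0 * T)

rates = [exit_rate(eps, a=1.0, T=1.0) for eps in (0.1, 0.01, 0.001)]
limit = action_minimum(1.0, 1.0)
```

The values in `rates` decrease toward `limit` as the noise parameter
shrinks, in line with the convergence of $I^\varepsilon$ to $I^0$.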

In [32], the logarithmic transformation was applied to solutions to
the pathwise equation of nonlinear filtering, making a connection
between filtering and stochastic control.

7. Partial observations: adaptive control. The states $x_t$ of
a stochastic system often cannot in practice be measured directly, or
perhaps can only be measured with random errors. This has led to an
extensive literature on nonlinear filtering and on optimal control under
partial observations. For controlled diffusions, a standard model is
to take state dynamics (2.3) and an observation process $y_t$ governed by

(7.1)  $y_t = \int_0^t h(x_s)\,ds + \tilde{w}_t$,

with $\tilde{w}$ a brownian motion independent of $w$. The information
available to the controller at time $t$ is usually assumed to be described
by the $\sigma$-algebra $\mathcal{G}_t$ generated by observations $y_s$ for
$s \le t$. However, existence of optimal controls has been proved only
with a somewhat wider class of admissible controls than those adapted
to this family $\{\mathcal{G}_t\}$.

Several good survey articles on controlled partially observed
diffusions have recently appeared [15][16][17]. Hence, we shall not
try to summarize the various results here. In studying partially
observed control problems it is useful to introduce an auxiliary
"separated" control problem. In the separated problem the role of
"state" process is taken by a measure-valued stochastic process
$\sigma_t$ [34]. The measure $\sigma_t$ represents an unnormalized
conditional distribution of $x_t$ given observations and controls $y_s$,
$u_s$, $0 \le s \le t$. A nonlinear semigroup for the controlled,
measure-valued process $\sigma_t$ has been constructed [11][30][33]. Among
other recent work, we mention that of Rishel [53] on partially observed
jump processes, and of Mazziotto-Szpirglas [49] on impulsive control under
partial information.
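
The notion of an unnormalized conditional distribution is easiest to see in
a discrete-time, finite-state analogue, sketched below in Python. This
analogue is an assumption made here for illustration and is not the
continuous-time model (2.3), (7.1): the state is a Markov chain with
transition matrix P (under whatever control is applied at that step), and
each observation is y = h[j] plus standard Gaussian noise when the state is
j. All names are hypothetical.

```python
import math

def filter_step(sigma, P, h, y):
    """One step of the unnormalized filter.

    Predict with the transition matrix P, then weight state j by
    exp(h[j] * y - h[j]**2 / 2), the Gaussian likelihood ratio of the
    observation y against pure noise.
    """
    n = len(sigma)
    predicted = [sum(sigma[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [predicted[j] * math.exp(h[j] * y - 0.5 * h[j] ** 2) for j in range(n)]

def normalize(sigma):
    """Conditional distribution obtained from the unnormalized measure."""
    total = sum(sigma)
    return [s / total for s in sigma]

P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical two-state transition matrix
h = [0.0, 1.0]                 # hypothetical observation function
sigma = [0.5, 0.5]             # prior distribution of the state
for y in (0.9, 1.3, -0.2):     # made-up observation record
    sigma = filter_step(sigma, P, h, y)
conditional = normalize(sigma)
```

The update of the unnormalized measure is linear in `sigma`, which is the
feature that makes it convenient as the "state" of the separated problem;
normalization is deferred to the end.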
Adaptive control. In adaptive control the objective is the simultaneous
control and identification of unknown system parameters. Common
techniques in discrete-time adaptive control involve sequential
techniques, based on maximum likelihood or least squares, for updating
estimates of unknown parameters. In the context of adaptive control of
Markov chains see the pioneering work of Mandl [48], also
Borkar-Varaiya [23], Kumar-Lin [41]. Another (Bayesian) viewpoint is to
treat adaptive control of Markov processes as a special case of stochastic
control under partial observations. This is done by simply regarding
the unknown parameters as additional (non-time-varying) components of
the system state. From a practical standpoint this approach encounters
well known difficulties, in that effective solutions to partially
observed stochastic problems are difficult to obtain. Nevertheless,
special cases in which the problem becomes finite dimensional have
been treated by Hijab [38] and Rishel [54].

8. Impulse control; problems with switching costs. In impulse
control problems the control actions are taken at discrete (random)
time instants, and each control action leads to an instantaneous change
in the state $x_t$. Typical impulse control problems are those of stock
inventory management, in which a control action is to reorder with
immediate delivery of the order.

The analytic treatment of impulse control was initiated and developed
systematically by Bensoussan and Lions [3], with emphasis on the control
of nondegenerate diffusions. The dynamic programming equation is
replaced by a set of inequalities which take the form of a quasivariational
inequality. For the case of degenerate diffusions see Menaldi [50], and
for impulsive control of Markov Feller processes see Robin [55][56].
Lepeltier-Marchal [43] gave a probabilistic treatment.
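
For orientation, a quasivariational inequality of this kind can be written
out in a stationary discounted setting. The setting and the notation are
assumptions made here for simplicity and are not taken from the text:
$\beta > 0$ is a discount rate, $k(x)$ a running cost, $c(\xi)$ the cost of
an impulse that moves the state from $x$ to $x + \xi$, $L$ the generator of
the uncontrolled process, and $M$ the intervention operator.

```latex
\begin{aligned}
MW(x) &= \inf_{\xi}\big[\, c(\xi) + W(x+\xi) \,\big], \\
\beta W - LW - k &\le 0, \qquad W - MW \le 0, \qquad
(\beta W - LW - k)\,(W - MW) = 0 .
\end{aligned}
```

The first inequality is the dynamic programming inequality when no impulse
is applied, the second says that an immediate impulse is always available,
and the product condition says that at each state one of the two holds with
equality. The inequality is called quasivariational because the obstacle
$MW$ depends on the unknown $W$ itself.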

Another class of stochastic control problems of recent interest are
those in which control actions are taken at discrete time instants, with
no instantaneous change in $x_t$ but with a cost of switching control
actions. Such problems arise in the theory of controlled queues (see
Sheng [57]) and in control of energy generating systems under uncertain
demand. The analytical treatment again is to reduce the problem to a
quasivariational inequality. See Lenhart-Belbas [42], Liao [44].

9. Applications. Optimal stochastic control theory was initially
motivated by problems of control of physical devices. More recent
influences have come from management science, economics, and information
systems. Until now, the impact on engineering practice of much of the
sophisticated mathematical theory has been small. The stochastic linear
regulator is a standard tool, because the optimal Markov control
policies turn out to be linear in the state $x$. If the Markov policy is
nonlinear, it is difficult to implement. Moreover, other issues may be
considered in practice more important than optimality of system
performance as predicted by the stochastic control model. The model is
generally a simplification of nature, through linearizations, reductions of
dimensionality, assumptions that noises are white, etc. A control
which performs well (even optimally) according to the model may behave
poorly in a real control system. The question of robustness of controls
with respect to unmodelled system dynamics is of current interest in the
engineering control literature. See for example [63]. A different sort
of question is that of stochastic controllability [64].

We conclude by mentioning two novel applications of stochastic
control. One is Arrow's model of exploration, consumption, and pricing of
a randomly distributed natural resource. This model was analyzed in
detail by Hagan-Caflisch-Keller [37]. They determined approximately
the free boundary between portions of the state space where new
exploration should or should not be undertaken.

Ludwig and associates have applied a stochastic control method to
fishery management problems [47]. The fishery resource is controlled
through the rate at which fish are harvested. This work has an
important statistical aspect as well as the control aspect, since errors in
measuring unknown parameters in the fishery model can be important.